Abstract

We use process level large deviation analysis to obtain the rate function for a general family of occupancy problems. Our interest is the asymptotics of the empirical distributions of various quantities (such as the fraction of urns that contain a given number of balls). In the general setting, balls are allowed to land in a given urn depending on the urn's contents prior to the throw. We discuss a parametric family of statistical models which includes Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac statistics as special cases. A process level large deviation analysis is conducted and the rate function for the original problem is then characterized, via the contraction principle, by the solution to a calculus of variations problem. We conjecture that the solution to the variational problem coincides with that of a finite dimensional minimization problem.


1 Introduction

Occupancy problems center on the distribution of $r$ balls that have been thrown into $n$ urns. In the simplest scenario each ball is equally likely to land in any of the urns, i.e., each ball is independently assigned to a given urn with probability $1/n$. In this case, we say that the urn model uses Maxwell-Boltzmann (MB) statistics. This model has been studied for decades and applied in diverse fields such as computer science, biology, and statistics. See [2, 6, 7] and the references therein. However, balls may also enter the urns in a nonuniform way. An important generalization is to allow the likelihood that the ball lands in a given urn to depend on its contents prior to the throw, as in Bose-Einstein (BE) and Fermi-Dirac (FD) statistics. See [7, 4, 9] and the references therein.

For MB statistics, many results have been obtained using “exact” methods. For example, combinatorial methods are used in [5], and methods that use generating functions are discussed in [7]. Although these methods do not directly involve approximations, they can be difficult to implement. For example, combinatorial methods require handling differences of events with the inclusion-exclusion formula, and the resulting alternating sums can involve large numerical errors. Similar difficulties occur in the moment generating function approach of [7].

Large deviation approximations give an attractive alternative to both of these approaches. One reason is that they offer good approximations with only modest computation. A second, perhaps more important, reason is that qualitative insights can be obtained. In [8] the large deviation principle (LDP) for the MB model is obtained, and the rate function is exhibited in more-or-less explicit form.

In the present paper, we discuss a parametric family of statistical models, of which the previously mentioned MB, BE and FD statistics are all special cases. We assume there are $n$ urns and that $\lfloor Tn\rfloor$ balls are thrown into them (where $\lfloor s\rfloor$ denotes the integer part of $s$), and analyze the asymptotic properties as $n$ goes to $\infty$. A typical problem of interest is to characterize the large deviation asymptotics of the empirical distribution after all the balls are thrown. For example, one may wish to estimate the probability that at most half of the urns are empty after all the balls are thrown. A direct analysis of this problem is hard, so instead we lift the problem to the process level and analyze the large deviation asymptotics there. Once the process level large deviation analysis is done, one can apply the contraction principle to answer the original question. We conjecture that the variational problem that results from the contraction principle can in fact be solved explicitly (as was done in [8] for MB), and the conjectured formula is stated in Section 6.

Although process level large deviations are by now quite standard, there are several interesting features, both qualitative and technical, which distinguish occupancy models and place them outside the range of existing theory. The most significant of these, as far as the proof is concerned, are the singular transition rates that occur in the (Markovian) process level description of the model. We will use a weak convergence approach that is naturally suited to these problems and results in a compact and self-contained proof, one that can easily accommodate further generalizations of the model. A second very interesting feature is the previously mentioned possibility of explicit solutions to the variational problems that arise in the process level approximations.

The outline of the paper is as follows. In Section 2 the parametric family of occupancy problems is described in detail. A dynamical system characterization of the random occupancy process is given, and a representation for certain exponential integrals is given in terms of a “controlled” occupancy process. From this representation formula one can identify the large deviation rate function immediately. In Section 3 we prove the lower bound for the Laplace principle, which corresponds to the large deviation upper bound. In Section 4, the rate function $I$ is studied more closely so as to deal with the technical difficulty of the singular transition rates. In Section 5, we prove the upper bound for the Laplace principle, which corresponds to the large deviation lower bound. Finally, in Section 6 we conjecture a simplified formula for the rate function of the process at a given fixed time.

2 Preliminaries and Main Result

In this section, we formulate the problem of interest and state the LDP. The proof is given in the sections that follow. As described in the introduction, we focus on the asymptotic behavior of the general occupancy problem.

The general occupancy problem has the same structure as the Maxwell-Boltzmann occupancy problem, except that in the general problem urns are distinguished according to the number of balls contained therein. The full collection of models will be indexed by a parameter $a$. This parameter takes values in the set $(0,\infty] \cup \{-1,-2,\ldots\}$, and its interpretation is as follows. Suppose that a ball is about to be thrown into a fixed set of urns, and that any two urns (labeled, say, A and B) are selected. An urn is said to be of category $i$ if it contains $i$ balls. Suppose that urn A is of category $i$ while B is of category $j$. Then the probability that the ball is thrown into urn A, conditioned on the state of all the urns and on the event that the ball is thrown into either urn A or urn B, is

$$\frac{a+i}{(a+i)+(a+j)}.$$

When $a=\infty$ we interpret this to mean that the two urns are equally likely.




Also, when $a<0$ we use this ratio to define the probabilities only when $0 \le i \vee j \le -a$ and either $i<-a$ or $j<-a$, so that the formula gives a well-defined probability. The probability that a ball is placed in an urn of category $-a$ is 0. Thus under this model, urns can only be of category $0,1,\ldots,-a$, and we only throw balls into categories $0,1,\ldots,-a-1$.

In this setup, certain special cases are distinguished. The cases $a=1$, $a=\infty$ and $a=-1$ correspond to what are called Bose-Einstein statistics, Maxwell-Boltzmann statistics and Fermi-Dirac statistics, respectively.

Suppose that before we throw a ball there are already $tn$ balls in all the urns, and further suppose that the occupancy state is $(x_0, x_1, \ldots, x_I, x_{I+})$. Here $x_i$, $i=0,\ldots,I$, denotes the fraction of urns that contain $i$ balls, and $x_{I+}$ denotes the fraction containing more than $I$ balls. Then the “un-normalized” or “relative” probability of throwing into a category $i$ urn with $i \le I$ is simply $(a+i)x_i$. Let us temporarily abuse notation, and let $x_{I+1}, x_{I+2}, \ldots$ denote the exact fraction in each category $i$ with $i>I$. Since there are $tn$ balls in the urns before we throw, $\sum_{i=0}^{\infty} x_i = 1$ and $\sum_{i=0}^{\infty} i\,x_i = t$, so that $\sum_{i=0}^{\infty}(a+i)x_i = a+t$. Thus the (normalized and true) probability that the ball is placed in an urn that contains exactly $i$ balls, $i=0,\ldots,I$, is $(a+i)x_i/(a+t)$, and the probability that the ball is placed in an urn that has more than $I$ balls is $1-\sum_{j=0}^{I}(a+j)x_j/(a+t)$.
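The throwing rule can be simulated urn by urn. The Python sketch below is only an illustration (the function names are hypothetical, not from the paper): each throw picks an urn holding $c$ balls with weight $a+c$ when $a>0$, with equal weights when $a=\infty$, and with weight $-(a+c)$ when $a$ is a negative integer. The last choice gives the same normalized probabilities as $(a+c)/(a+t)$, because numerator and denominator are then both negative, and it vanishes once an urn holds $-a$ balls.

```python
import math
import random
from collections import Counter


def throw_weight(c, a):
    """Relative (un-normalized) probability of hitting an urn holding c balls.

    a = math.inf: Maxwell-Boltzmann, all urns equally likely.
    a > 0: weight a + c.
    a a negative integer: weight -(a + c), the same ratio (a + c)/(a + t)
    with numerator and denominator negated; it is 0 once c = -a.
    """
    if a == math.inf:
        return 1.0
    return a + c if a > 0 else -(a + c)


def simulate_urns(n, num_balls, a, seed=0):
    """Throw num_balls balls one at a time into n initially empty urns.

    For negative integer a this assumes num_balls <= -a * n.
    """
    rng = random.Random(seed)
    contents = [0] * n
    for _ in range(num_balls):
        weights = [throw_weight(c, a) for c in contents]
        k = rng.choices(range(n), weights=weights)[0]
        contents[k] += 1
    return contents


def occupancy_fractions(contents, I):
    """Fractions (x_0, ..., x_I, x_{I+}) of urns in each category."""
    n = len(contents)
    counts = Counter(min(c, I + 1) for c in contents)
    return [counts[i] / n for i in range(I + 2)]
```

For example, `simulate_urns(50, 40, -1)` (Fermi-Dirac) never places two balls in the same urn, and for any full configuration the average of $a+c$ over urns equals $a+t$, which is the normalization used above.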

An explicit construction of this process is as follows. To simplify, we assume the empty initial condition, i.e., all urns are empty. One can consider other initial conditions, with only simple notational changes in the results to be stated below. We introduce a time variable $t$ that ranges from 0 to $T$. At a time $t$ of the form $l/n$, with $0 \le l \le \lfloor nT\rfloor$ an integer, $l$ balls have been thrown. Let $X^n(t) = (X^n_0(t), X^n_1(t), \ldots, X^n_I(t), X^n_{I+}(t))$ be the occupancy state at that time. As noted previously, $X^n_i(t)$ denotes the fraction of urns that contain $i$ balls at time $t$, $i=0,\ldots,I$, and $X^n_{I+}(t)$ the fraction of urns that contain more than $I$ balls. The definition of $X^n$ is extended to all $t \in [0,T]$ not of the form $l/n$ by piecewise linear interpolation. Note that $X^n(t)$ is indeed a probability vector in $\mathbb{R}^{I+2}$. If

$$S_I = \Big\{x \in \mathbb{R}^{I+2} : x_i \ge 0,\ 0 \le i \le I+1, \text{ and } \sum_{i=0}^{I+1} x_i = 1\Big\},$$

then for any $t \in [0,T]$, $X^n(t) \in S_I$. Thus $X^n$ takes values in $\mathcal{U} = C([0,T], S_I)$. We equip $\mathcal{U}$ with the usual supremum norm, and on $S_I$ we take the usual $L^1$ norm.

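It will be convenient to work with the following “dynamical system” representation. For $x \in \mathbb{R}^{I+2}$ and $t \in [0, -a1_{\{a<0\}} + \infty 1_{\{a>0\}})$ define the vector $p(t,x) \in \mathbb{R}^{I+2}$ by

$$p_k(t,x) = \frac{a+k}{a+t}\,x_k, \qquad k = 0,\ldots,I, \tag{2.1}$$

and

$$p_{I+1}(t,x) = 1 - \sum_{k=0}^{I}\frac{a+k}{a+t}\,x_k.$$

(When $a=\infty$ these are interpreted as the limits $p_k(t,x) = x_k$, $k=0,\ldots,I$.) A direct calculation shows that if

$$x \in S_I \quad\text{and}\quad \sum_{k=0}^{I+1} k\,x_k \le t, \tag{2.2}$$

then $p(t,x)$ is indeed a probability vector in $\mathbb{R}^{I+2}$, i.e., $p(t,x) \in S_I$. Let $e_0,\ldots,e_{I+1}$ denote the standard unit vectors in $\mathbb{R}^{I+2}$. We can then define a family of independent random vector fields

$$\{y^{i,n}(\cdot) : i = 0,1,\ldots,\lfloor nT\rfloor\}$$

that take values in

$$\Lambda = \{e_{j+1} - e_j : 0 \le j \le I\} \cup \{0\}$$

and have distributions

$$P\{y^{i,n}(x) = v\} = \begin{cases} p_k(i/n, x), & v = e_{k+1} - e_k,\ 0 \le k \le I,\\ p_{I+1}(i/n, x), & v = 0. \end{cases}$$

Thus $v = e_{k+1}-e_k$ corresponds to a ball landing in an urn of category $k$, and $v = 0$ to a ball landing in an urn that already holds more than $I$ balls. Finally, we define $X^n(i/n)$ recursively by

$$X^n((i+1)/n) = X^n(i/n) + \frac{1}{n}\,y^{i,n}(X^n(i/n)),$$

with the initial condition $X^n(0) = (1,0,\ldots,0)$. Observe that the increments $\{y^{i,n}(X^n(i/n))\}$ are conditionally distributed according to $p(i/n, X^n(i/n))$, and thus the process $X^n$ is Markovian and has the same distribution as the occupancy process described previously.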
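The recursion above can be sketched at the level of the state vector. In this Python/NumPy sketch (hypothetical names, not from the paper), `p_vec` implements (2.1) and `simulate_occupancy` draws each increment from $\Lambda$ with probabilities $p(l/n, X^n(l/n))$. Tracking integer urn counts and clipping tiny negative round-off in $p_{I+1}$ are implementation choices made here; for negative $a$ the sketch assumes $T<-a$.

```python
import numpy as np


def p_vec(t, x, a):
    """Throw probabilities p(t, x) of (2.1); x has length I + 2.

    a = np.inf gives the Maxwell-Boltzmann limit p_k = x_k.
    """
    x = np.asarray(x, dtype=float)
    I = len(x) - 2
    k = np.arange(I + 1)
    if np.isinf(a):
        head = x[: I + 1].copy()
    else:
        head = (a + k) / (a + t) * x[: I + 1]
    return np.append(head, 1.0 - head.sum())


def simulate_occupancy(n, T, I, a, seed=0):
    """Return the states X^n(l/n), l = 0, ..., floor(nT), as rows of an array."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(I + 2, dtype=int)
    counts[0] = n                      # empty initial condition X^n(0) = e_0
    path = [counts / n]
    for l in range(int(np.floor(n * T))):
        p = np.clip(p_vec(l / n, counts / n, a), 0.0, None)  # round-off guard
        k = rng.choice(I + 2, p=p / p.sum())
        if k <= I:                     # increment e_{k+1} - e_k
            counts[k] -= 1
            counts[k + 1] += 1
        path.append(counts / n)        # k = I + 1: increment 0
    return np.array(path)
```

Every simulated state stays in $S_I$ and satisfies the ball-count constraint in (2.2), since $(I+1)x_{I+1}$ undercounts the balls in the last category.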

Often one is interested in the large deviations of the empirical occupancy measure at the terminal time $T$, namely $X^n(T)$. We study this by analyzing the large deviation properties of the whole process $X^n$ and then using the contraction principle. The Laplace formulation will be used to perform the process level analysis. Let $F$ be any bounded and continuous function on $\mathcal{U}$. The processes $X^n$ are said to satisfy the Laplace principle with rate function $I$ if the following two conditions hold:

1. For each $K<\infty$, the set $\{\varphi \in \mathcal{U} : I(\varphi) \le K\}$ is compact in $\mathcal{U}$.

2. $$\lim_{n\to\infty} -\frac{1}{n}\log E\exp[-nF(X^n)] = \inf_{\varphi\in\mathcal{U}}[I(\varphi)+F(\varphi)].$$

Since $X^n$ takes values in a Polish space, the notions of Laplace principle and large deviation principle are equivalent [3, Theorem 1.2.1].

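Define the $(I+2)\times(I+2)$ matrix

$$M = \begin{pmatrix} -1 & 0 & 0 & \cdots & 0 & 0\\ 1 & -1 & 0 & \cdots & 0 & 0\\ 0 & 1 & -1 & \cdots & 0 & 0\\ \vdots & & \ddots & \ddots & & \vdots\\ 0 & \cdots & 0 & 1 & -1 & 0\\ 0 & \cdots & 0 & 0 & 1 & 0 \end{pmatrix}.$$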

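Let $\varphi \in \mathcal{U}$ be given with $\varphi_0(0) = 1$. Suppose there is a Borel measurable function $\theta : [0,T] \to S_I$ such that for any $t \in [0,T]$

$$\varphi(t) = \varphi(0) + \int_0^t M\theta(s)\,ds. \tag{2.3}$$

We interpret $\theta_i(s)$ as the rate at which balls are thrown into urns that contain $i$ balls at time $s$. Moreover, $\theta$ is unique in the sense that if another $\tilde\theta : [0,T] \to S_I$ satisfies (2.3) then $\tilde\theta = \theta$ a.e. on $[0,T]$: the first $I+1$ rows of (2.3) give $\theta_0 = -\dot\varphi_0$ and $\theta_i = \theta_{i-1} - \dot\varphi_i$, $1 \le i \le I$, a.e., and then $\theta_{I+1} = 1 - \sum_{i=0}^{I}\theta_i$. We call $\varphi$ a valid occupancy state process if there exists $\theta : [0,T] \to S_I$ satisfying (2.3). In this case $\theta$ is called the occupancy rate process associated with $\varphi$. It is easy to observe that if $\varphi$ is valid then $\varphi(s)$ satisfies (2.2) for all $s \in [0,T]$; indeed $\varphi(0) = (1,0,\ldots,0)$ and, by (2.3), $\frac{d}{ds}\sum_{k=0}^{I+1} k\varphi_k(s) = \sum_{k=0}^{I}\theta_k(s) = 1 - \theta_{I+1}(s) \le 1$. This shows that $p(s,\varphi(s)) \in S_I$.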
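A small NumPy sketch of the matrix $M$ and of the two facts used above: $M\theta$ has zero total mass (so (2.3) keeps $\varphi$ on the simplex), $\sum_k k(M\theta)_k = 1-\theta_{I+1}$ (the ball-count rate behind (2.2)), and $\theta$ is recovered from $\dot\varphi = M\theta$ row by row. The function names are hypothetical.

```python
import numpy as np


def occupancy_matrix(I):
    """The (I+2) x (I+2) matrix M: -1 on the diagonal in columns 0..I,
    +1 just below it, and a zero last column."""
    M = np.zeros((I + 2, I + 2))
    for j in range(I + 1):
        M[j, j] = -1.0
        M[j + 1, j] = 1.0
    return M


def rates_from_derivative(dphi):
    """Recover theta from phi' = M theta: theta_0 = -phi'_0,
    theta_i = theta_{i-1} - phi'_i for 1 <= i <= I, and
    theta_{I+1} = 1 - (theta_0 + ... + theta_I) since theta lies in S_I."""
    I = len(dphi) - 2
    theta = np.empty(I + 2)
    theta[0] = -dphi[0]
    for i in range(1, I + 1):
        theta[i] = theta[i - 1] - dphi[i]
    theta[I + 1] = 1.0 - theta[: I + 1].sum()
    return theta
```

With $I=3$ and $\theta = (0.1, 0.2, 0.3, 0.15, 0.25)$, the entries of $M\theta$ sum to 0, the ball-count rate equals $0.75$, and `rates_from_derivative(M @ theta)` returns $\theta$.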

The relative entropy function will be used throughout the paper, and we define it now. For two probability measures $\alpha$ and $\beta$ on a Polish space $\mathcal{A}$, the relative entropy of $\alpha$ with respect to $\beta$ is defined by

$$R(\alpha\|\beta) = \int_{\mathcal{A}}\Big(\log\frac{d\alpha}{d\beta}\Big)\,d\alpha$$

whenever $\alpha$ is absolutely continuous with respect to $\beta$ (with the convention that $0\log 0 = 0$). In all other cases we set $R(\alpha\|\beta) = \infty$. When two probability vectors $\mu$ and $\nu \in S_I$ appear in the relative entropy function, we interpret them as probability measures on the finite set $\{0,1,\ldots,I,I+1\}$, and thus

$$R(\mu\|\nu) = \sum_{i=0}^{I+1}\mu_i\log\frac{\mu_i}{\nu_i}.$$
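For probability vectors the definition above is a finite sum. A plain-Python sketch (the function name is hypothetical) that applies the convention $0\log 0 = 0$ and returns $\infty$ when $\mu$ is not absolutely continuous with respect to $\nu$:

```python
import math


def relative_entropy(mu, nu):
    """R(mu || nu) for probability vectors on {0, ..., I+1}.

    Terms with mu_i = 0 are dropped (0 log 0 = 0); if some mu_i > 0 has
    nu_i = 0, mu is not absolutely continuous w.r.t. nu and the result is inf.
    """
    if len(mu) != len(nu):
        raise ValueError("mu and nu must have the same length")
    total = 0.0
    for m, v in zip(mu, nu):
        if m == 0.0:
            continue
        if v == 0.0:
            return math.inf
        total += m * math.log(m / v)
    return total
```

For instance, $R((1,0)\|(1/2,1/2)) = \log 2$, while $R((1/2,1/2)\|(1,0)) = \infty$.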
As observed before, when $\varphi$ is valid, $p(s,\varphi(s)) \in S_I$, which makes $R(\theta(s)\|p(s,\varphi(s)))$ well defined. For such $\varphi$ define

$$I(\varphi) = \int_0^T R(\theta(s)\|p(s,\varphi(s)))\,ds. \tag{2.4}$$

If $\varphi$ is not valid then define $I(\varphi) = \infty$. In the next three sections we will prove that the urn models constructed in this section satisfy the Laplace principle with rate function $I$. In particular, in Section 3 we will prove

$$\liminf_{n\to\infty} -\frac{1}{n}\log E\exp[-nF(X^n)] \ge \inf_{\varphi\in\mathcal{U}}[I(\varphi)+F(\varphi)],$$

and in Section 5 we will prove

$$\limsup_{n\to\infty} -\frac{1}{n}\log E\exp[-nF(X^n)] \le \inf_{\varphi\in\mathcal{U}}[I(\varphi)+F(\varphi)].$$

These bounds are equivalent to the large deviation upper and lower bounds, respectively [3]. In Section 4, we will prove several properties of the rate function $I$, and in particular show that $I$ has compact level sets.
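To make (2.4) concrete, the following NumPy sketch approximates $I(\varphi)$ for a valid path sampled on a time grid. It is a numerical illustration under stated assumptions, not part of the paper: on each cell $\theta$ is recovered from the difference quotient of $\varphi$ by inverting $\dot\varphi = M\theta$, $p$ is evaluated at the cell midpoint, and the integral becomes a midpoint sum. The tolerance `tol`, which zeroes rates that are pure round-off, is an ad hoc choice.

```python
import numpy as np


def rate_function(phi, ts, a, tol=1e-9):
    """Approximate I(phi) of (2.4) for a valid path sampled on the grid ts.

    phi has shape (len(ts), I + 2); a may be np.inf (Maxwell-Boltzmann).
    """
    I = phi.shape[1] - 2
    k = np.arange(I + 1)
    total = 0.0
    for j in range(len(ts) - 1):
        h = ts[j + 1] - ts[j]
        dphi = (phi[j + 1] - phi[j]) / h
        theta = np.empty(I + 2)                   # invert phi' = M theta
        theta[0] = -dphi[0]
        for i in range(1, I + 1):
            theta[i] = theta[i - 1] - dphi[i]
        theta[I + 1] = 1.0 - theta[: I + 1].sum()
        theta[np.abs(theta) < tol] = 0.0          # round-off guard
        x, t = 0.5 * (phi[j] + phi[j + 1]), 0.5 * (ts[j] + ts[j + 1])
        head = x[: I + 1] if np.isinf(a) else (a + k) / (a + t) * x[: I + 1]
        p = np.append(head, 1.0 - head.sum())     # p(t, x) from (2.1)
        pos = theta > 0                           # 0 log 0 = 0
        if np.any(p[pos] <= 0):
            return np.inf                         # theta not << p
        total += h * np.sum(theta[pos] * np.log(theta[pos] / p[pos]))
    return total
```

As a check under MB statistics, the Poisson path $\varphi_i(t) = t^i e^{-t}/i!$ gives a value near 0, while the path that throws every ball into an empty urn, $\varphi = (1-t, t, 0, \ldots)$, has $\theta = e_0$ and cost $\int_0^T -\log(1-s)\,ds = T + (1-T)\log(1-T)$.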

It will turn out that certain representation formulas for exponential integrals simplify the proof of the Laplace principle. Consider a controlled process $\bar X^n(t)$ constructed as follows. The process dynamics have the same general structure as those of $X^n$, except that $y^{i,n}(X^n(i/n))$ is replaced by a sequence of controlled random vectors $\bar y^{i,n}$. Let $(\mathcal{X},\mathcal{A})$ be a measurable space, let $\mathcal{Y}$ be a Polish space, and let $\tau(dy|x)$ be a family of probability measures on $\mathcal{Y}$ parameterized by $x \in \mathcal{X}$. We call $\tau(dy|x)$ a stochastic kernel on $\mathcal{Y}$ given $\mathcal{X}$ if for every Borel subset $E$ of $\mathcal{Y}$ the function $x \in \mathcal{X} \mapsto \tau(E|x) \in [0,1]$ is measurable. The conditional distributions of the controlled random vectors will be specified by a sequence $\{\nu^{i,n} : i = 0,1,\ldots,\lfloor nT\rfloor\}$, where each $\nu^{i,n} = \nu^{i,n}(\cdot|x_0, x_1, \ldots, x_i)$ is a stochastic kernel on $\Lambda$ given $(S_I)^{i+1}$. We call such a sequence $\{\nu^{i,n} : i = 0,1,\ldots,\lfloor nT\rfloor\}$ an admissible control sequence. Each control $\nu^{i,n}$ will give rise to a corresponding relative entropy term in the representation formula.

The controlled process is determined by

$$\bar X^n((i+1)/n) = \bar X^n(i/n) + \frac{1}{n}\,\bar y^{i,n}, \quad i = 0,1,\ldots,\lfloor nT\rfloor, \qquad \bar X^n(0) = (1,0,\ldots,0),$$

where $\bar y^{i,n}$ has the conditional distribution $\nu^{i,n}(\cdot|\bar X^n(0),\ldots,\bar X^n(i/n))$. The random vectors $\bar X^n(i/n)$ and $\bar y^{i,n}$ are defined recursively in the following order:

$$\bar X^n(0),\ \bar y^{0,n},\ \bar X^n(1/n),\ \bar y^{1,n},\ \bar X^n(2/n),\ \ldots,\ \bar X^n(\lfloor nT\rfloor/n),\ \bar y^{\lfloor nT\rfloor,n}.$$

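For all $n \in \mathbb{N}$ the controlled random vectors $\bar X^n(i/n)$ and $\bar y^{i,n}$ are defined on a common probability space $(\Omega,\mathcal{F},\bar P)$, and expectation on this space is denoted by $\bar E$. Define

$$\bar p^{i,n} = p(i/n, \bar X^n(i/n)),$$

where $p(t,x)$ was defined in (2.1). Then by [3, Proposition 1.4.2] and the chain rule for relative entropy [3, Theorem C.3.1],

$$-\frac{1}{n}\log E\exp[-nF(X^n)] = \inf_{\{\nu^{i,n}\}}\bar E\Big[F(\bar X^n) + \frac{1}{n}\sum_{i=0}^{\lfloor nT\rfloor}R(\nu^{i,n}\|\bar p^{i,n})\Big], \tag{2.5}$$

where the infimum is over all admissible control sequences $\{\nu^{i,n}\}$. Here $\nu^{i,n}$ is evaluated at $(\bar X^n(0),\ldots,\bar X^n(i/n))$, and $\Lambda$ is identified with $\{0,\ldots,I+1\}$ via $e_{k+1}-e_k \leftrightarrow k$ and $0 \leftrightarrow I+1$.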
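The representation (2.5) is built from the variational formula for exponential integrals cited from [3]. On a finite set it reads $-\log\sum_j\mu_j e^{-F_j} = \min_\nu[\sum_j\nu_jF_j + R(\nu\|\mu)]$, with minimizer $\nu_j \propto \mu_j e^{-F_j}$. The plain-Python sketch below (hypothetical names) checks only this single-step identity; it does not implement the multi-step, process level formula.

```python
import math


def gibbs_check(F, mu):
    """Return (-log E_mu[exp(-F)], objective at the Gibbs measure, objective).

    objective(q) = sum_j q_j F_j + R(q || mu) is minimized by
    q_j = mu_j exp(-F_j) / Z, where it equals -log Z.
    """
    Z = sum(m * math.exp(-f) for f, m in zip(F, mu))
    lhs = -math.log(Z)
    gibbs = [m * math.exp(-f) / Z for f, m in zip(F, mu)]

    def objective(q):
        ent = sum(qi * math.log(qi / mi) for qi, mi in zip(q, mu) if qi > 0)
        return sum(qi * fi for qi, fi in zip(q, F)) + ent

    return lhs, objective(gibbs), objective
```

Any other choice of $q$, for example $q = \mu$ itself, gives an objective value at least $-\log Z$, which is Jensen's inequality in this setting.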


3 The Large Deviation Upper Bound


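In this section, we prove

$$\liminf_{n\to\infty} -\frac{1}{n}\log E\exp[-nF(X^n)] \ge \inf_{\varphi\in\mathcal{U}}[I(\varphi)+F(\varphi)],$$

which corresponds to the large deviation upper bound. By (2.5) it is enough to show that

$$\liminf_{n\to\infty}\inf_{\{\nu^{i,n}\}}\bar E\Big[F(\bar X^n) + \frac{1}{n}\sum_{i=0}^{\lfloor nT\rfloor}R(\nu^{i,n}\|\bar p^{i,n})\Big] \ge \inf_{\varphi\in\mathcal{U}}[I(\varphi)+F(\varphi)].$$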

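For $0 \le l \le \lfloor nT\rfloor$ and $t \in [l/n, (l+1)/n)$, define

$$\tilde X^n(t) = \bar X^n(l/n).$$

Thus $\tilde X^n$ is the piecewise constant interpolation of the controlled occupancy process. Note that for all $\omega$

$$\sup_{t\in[0,T]}\big\|\tilde X^n(t) - \bar X^n(t)\big\| \le \frac{2}{n}.$$

Therefore if $\bar X^n$ converges weakly to $X$, then $\tilde X^n$ also converges weakly to $X$.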


We do the same thing for the controlled measures, and for $t \in [l/n, (l+1)/n)$ set

$$\nu^n(t) = \nu^{l,n} \quad\text{and}\quad \bar p^n(t) = \bar p^{l,n} = p(l/n, \tilde X^n(t)).$$

Note that because relative entropy is nonnegative and $(\lfloor nT\rfloor+1)/n \ge T$,

$$\frac{1}{n}\sum_{i=0}^{\lfloor nT\rfloor}R(\nu^{i,n}\|\bar p^{i,n}) \ge \int_0^T R(\nu^n(t)\|\bar p^n(t))\,dt. \tag{3.1}$$

For an $S_I$-valued process $\{y(t)\}$ we define $\langle y\rangle$ as its indefinite integral, i.e.,

$$\langle y\rangle(t) = \int_0^t y(s)\,ds, \quad 0 \le t \le T.$$

Then $\langle y\rangle$ can be viewed as a vector of sub-cumulative distribution functions taking values in the space

$$\mathcal{Q} = \{\langle y\rangle : y : [0,T] \to S_I \text{ is measurable}\}.$$

We consider $\mathcal{Q}$ as a subset of $C([0,T],\mathbb{R}^{I+2})$ with the inherited topology. Since each component of $\langle y\rangle$ is Lipschitz continuous (with Lipschitz constant 1), the Arzelà-Ascoli Theorem implies that $\mathcal{Q}$ is precompact. Choose a convergent sequence $\{\langle y^n\rangle\}$ with limit $z$. Since each component of $z$ inherits the Lipschitz continuity and monotonicity of the $\langle y^n\rangle$, $z$ has an a.e. defined vector $y$ of non-negative derivatives, and for any $0 \le s \le t \le T$ these derivatives satisfy

$$\int_s^t\sum_{i=0}^{I+1}y_i(u)\,du = \sum_{i=0}^{I+1}[z_i(t)-z_i(s)] = t-s.$$

This implies that $y(t) \in S_I$ for almost every $t \in [0,T]$, and hence $z \in \mathcal{Q}$. We conclude that $\mathcal{Q}$ is compact.

Let $m\otimes y$ be the vector of sub-probability measures generated by $\langle y\rangle$, i.e., for each $0 \le i \le I+1$ and $0 \le s \le T$,

$$(m\otimes y)_i((-\infty,s]) = \frac{1}{T}\langle y\rangle_i(s).$$

Each component of $m\otimes y$ can be viewed as taking values in the space of sub-probability measures with the topology of weak convergence, and then $m\otimes y$ as taking values in the product space with the corresponding product topology. However, we can also consider $m\otimes y$ as a probability measure on $[0,T]\times\{0,1,\ldots,I,I+1\}$, with the topology of weak convergence on this space. These two topologies are clearly equivalent, and since uniform convergence of sub-cumulative distribution functions implies the weak convergence of the corresponding sub-probability measures, the mapping $\langle y\rangle \mapsto m\otimes y$ is continuous. The Continuous Mapping Theorem then implies the following result.

Lemma 3.1. Let $Y^n$ and $Y$ be $S_I$-valued random processes. If $\langle Y^n\rangle$ converges weakly to $\langle Y\rangle$, then $m\otimes Y^n$ converges weakly to $m\otimes Y$.

We will also need conditions under which $\langle Y^n\rangle$ converges weakly to $\langle Y\rangle$. Define $\mathcal{V} = D([0,T]:S_I)$ to be the space of functions that map $[0,T]$ into $S_I$, are right continuous, and have left-hand limits. Note that $\mathcal{U} \subset \mathcal{V}$. We equip $\mathcal{V}$ with the standard Skorohod metric $s(\cdot,\cdot)$, so that $(\mathcal{V},s)$ is a Polish space (cf. [1]). If $y^n \in \mathcal{V}$, $s(y^n,y) \to 0$, and $y \in \mathcal{U}$, then in fact $y^n(t) \to y(t)$ uniformly in $t \in [0,T]$, and hence $\langle y^n\rangle \to \langle y\rangle$. Another application of the Continuous Mapping Theorem gives the following.

Lemma 3.2. Suppose a sequence of $\mathcal{V}$-valued processes $Y^n$ converges weakly to the $\mathcal{U}$-valued process $Y$. Then $\langle Y^n\rangle$ converges weakly to $\langle Y\rangle$, and hence $m\otimes Y^n$ converges weakly to $m\otimes Y$.

We will also need the following formula, which can be verified directly from the definitions:

$$\int_0^T R(\nu^n(t)\|\bar p^n(t))\,dt = T\,R(m\otimes\nu^n\|m\otimes\bar p^n). \tag{3.2}$$
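Because $\nu^n$ and $\bar p^n$ are piecewise constant, (3.2) comes down to the factor $1/T$ in the definition of $m\otimes y$ cancelling inside the logarithm. The NumPy sketch below checks this with random probability vectors. It assumes, for simplicity, a grid of equal cells exactly covering $[0,T]$, so the truncated last cell of the construction above is ignored; the names are hypothetical.

```python
import numpy as np


def entropy(q, r):
    """Relative entropy R(q || r) for strictly positive r (0 log 0 = 0)."""
    pos = q > 0
    return float(np.sum(q[pos] * np.log(q[pos] / r[pos])))


def check_32(T=2.0, cells=8, I=3, seed=1):
    """Return both sides of (3.2) and the total mass of m (x) nu for random
    piecewise constant nu and p on `cells` equal cells covering [0, T]."""
    rng = np.random.default_rng(seed)
    h = T / cells
    nu = rng.dirichlet(np.ones(I + 2), size=cells)   # nu^n(t) on each cell
    pb = rng.dirichlet(np.ones(I + 2), size=cells)   # bar p^n(t) on each cell
    lhs = sum(h * entropy(nu[j], pb[j]) for j in range(cells))
    # m (x) y puts mass h * y_i / T on (cell j, category i): a probability
    m_nu, m_pb = (h / T) * nu, (h / T) * pb
    rhs = T * entropy(m_nu.ravel(), m_pb.ravel())
    return lhs, rhs, float(m_nu.sum())
```

The two sides agree up to round-off, and $m\otimes\nu^n$ has total mass 1, consistent with viewing it as a probability measure on $[0,T]\times\{0,\ldots,I+1\}$.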

Next we prove the key weak convergence theorem used in the process level analysis.

Theorem 3.3. Define a sequence of controlled processes and controls $(\bar X^n(t),\nu^n(t))$ as above. Then

$$\{(\bar X^n,\langle\nu^n\rangle), n\in\mathbb{N}\}$$

is tight. For any sequence from $\{(\bar X^n,\langle\nu^n\rangle), n\in\mathbb{N}\}$ consider a further subsequence that converges in distribution to $(X,\eta)$. Then the limit processes have the following properties:

1. There exists an $S_I$-valued process $\theta$ so that $\eta = \langle\theta\rangle$ w.p.1.

2. The process $X$ is a valid occupancy process and the process $\theta$ is the occupancy rate process associated with $X$, i.e., w.p.1

$$X(t) = X(0) + \int_0^t M\theta(s)\,ds \quad\text{for all } t\in[0,T].$$
Proof. Both $\bar X^n$ and $\langle\nu^n\rangle$ are uniformly (in $n$ and $\omega$) Lipschitz continuous. Hence by the Arzelà-Ascoli Theorem, $\{(\bar X^n,\langle\nu^n\rangle), n\in\mathbb{N}\}$ is tight. Let $(X,\eta)$ denote the weak limit of a convergent subsequence. Since the second component takes values in $\mathcal{Q}$ and this space is compact, there exists a measurable $S_I$-valued process $\theta(t)$ so that $\eta = \langle\theta\rangle$.

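Notice that for each $0 \le i \le I+1$ and $0 \le l \le \lfloor nT\rfloor$, and with the notational conventions $e_{I+2} = e_{I+1}$ and $\nu_{-1}(t) = 0$ (the first indicator below is interpreted as 0 when $i=0$),

$$\bar X^n_i\Big(\frac{l+1}{n}\Big) - \bar X^n_i\Big(\frac{l}{n}\Big) = \frac{1}{n}1_{\{\bar y^{l,n} = e_i - e_{i-1}\}} - \frac{1}{n}1_{\{\bar y^{l,n} = e_{i+1} - e_i\}} = \frac{1}{n}\nu^{l,n}_{i-1} - \frac{1}{n}\nu^{l,n}_i + \frac{1}{n}Y^{l,n}_i, \tag{3.3}$$

with $Y^{l,n}_i$ implicitly defined by (3.3). Here $\nu^{l,n}_i$ denotes the probability that $\nu^{l,n}$ assigns to $e_{i+1}-e_i$. In the same way that we defined $\nu^n$ and $\tilde X^n$ on the whole interval $[0,T]$ by piecewise constant interpolation, we can also define $Y^n(t)$ on $[0,T]$. Let $\mathcal{F}^n_l$ be the natural filtration, i.e., the $\sigma$-algebra generated by $\{\bar X^n(0), \bar X^n(1/n), \ldots, \bar X^n(l/n)\}$. Then

$$\bar E\big[1_{\{\bar y^{l,n} = e_i - e_{i-1}\}}\,\big|\,\mathcal{F}^n_l\big] = \nu^{l,n}_{i-1},$$

and likewise with $i$ in place of $i-1$, which shows that $\{Y^{l,n}_i, 0 \le l \le \lfloor nT\rfloor\}$ is a martingale difference sequence with respect to $\{\mathcal{F}^n_l\}$.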

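We have observed that $\{\sum_{l=0}^{k}Y^{l,n}_i : 0 \le k \le \lfloor nT\rfloor\}$ is a martingale with respect to the filtration $\{\mathcal{F}^n_l\}$. It is also easy to see that $\bar E[(Y^{l,n}_i)^2] = O(1)$; indeed $|Y^{l,n}_i| \le 2$. Summing (3.3) shows that for any $0 \le l \le \lfloor nT\rfloor$

$$\bar X^n_i(l/n) - \bar X^n_i(0) = \langle\nu^n_{i-1}\rangle(l/n) - \langle\nu^n_i\rangle(l/n) + \langle Y^n_i\rangle(l/n).$$

Owing to the fact that the integrands are uniformly bounded, if $t \in [l/n,(l+1)/n)$ for some $0 \le l \le \lfloor nT\rfloor$ then $\langle\nu^n\rangle(t) = \langle\nu^n\rangle(l/n) + O(1/n)$ and $\langle Y^n\rangle(t) = \langle Y^n\rangle(l/n) + O(1/n)$, where the $O(1/n)$ terms do not depend on $\omega$. Since the Lipschitz continuity of $\bar X^n$ implies $|\bar X^n_i(t) - \bar X^n_i(s)| \le |t-s|$ for any $s,t \in [0,T]$,

$$\bar X^n_i(t) - \bar X^n_i(0) = \langle\nu^n_{i-1}\rangle(t) - \langle\nu^n_i\rangle(t) + \langle Y^n_i\rangle(t) + g^n_i(t),$$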
where $g^n_i(t)$ converges to 0 uniformly in $t$ and in $\omega$. Recalling that $\{Y^{l,n}_i, 0 \le l \le \lfloor nT\rfloor\}$ is a martingale difference sequence and $\bar E[\langle Y^n_i\rangle(T)^2] = O(1/n)$, by a standard martingale inequality (e.g., Doob's maximal inequality)

$$\langle Y^n_i\rangle(t) \to 0 \quad\text{uniformly for } t\in[0,T], \text{ in probability}.$$

Since $\langle Y^n_i\rangle(t) + g^n_i(t)$ converges to 0 in probability and $(\bar X^n_i(t) - \bar X^n_i(0), \langle\nu^n_{i-1}\rangle(t) - \langle\nu^n_i\rangle(t))$ converges weakly to $(X_i(t) - X_i(0), \eta_{i-1}(t) - \eta_i(t))$,

$$X_i(t) - X_i(0) = \eta_{i-1}(t) - \eta_i(t) \quad\text{w.p.1}.$$

Recall that we have proved the existence of a process $\theta$ so that $\eta(t) = \int_0^t\theta(s)\,ds$. Thus the last display can be rewritten

$$X_i(t) - X_i(0) = \langle\theta\rangle_{i-1}(t) - \langle\theta\rangle_i(t) \quad\text{w.p.1},$$

which is indeed

$$X(t) = X(0) + \int_0^t M\theta(s)\,ds \quad\text{w.p.1}.$$

□


Theorem 3.4. Define $I$ by (2.4) for any of the occupancy models discussed in Section 2. If $F : \mathcal{U} \to \mathbb{R}$ is bounded and continuous, then

$$\liminf_{n\to\infty} -\frac{1}{n}\log E\exp[-nF(X^n)] \ge \inf_{\varphi\in\mathcal{U}}[I(\varphi)+F(\varphi)].$$

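Proof. Owing to the representation formula (2.5), it is enough to show that

$$\liminf_{n\to\infty}\inf_{\{\nu^{i,n}\}}\bar E\Big[F(\bar X^n) + \frac{1}{n}\sum_{i=0}^{\lfloor nT\rfloor}R(\nu^{i,n}\|\bar p^{i,n})\Big] \ge \inf_{\varphi\in\mathcal{U}}[I(\varphi)+F(\varphi)].$$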


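Consider any admissible sequence $\{\nu^{i,n}\}$. Then (3.1) and (3.2) imply

$$\bar E\Big[F(\bar X^n) + \frac{1}{n}\sum_{i=0}^{\lfloor nT\rfloor}R(\nu^{i,n}\|\bar p^{i,n})\Big] \ge \bar E\Big[F(\bar X^n) + \int_0^T R(\nu^n(t)\|\bar p^n(t))\,dt\Big] = \bar E\big[F(\bar X^n) + T\,R(m\otimes\nu^n\|m\otimes\bar p^n)\big].$$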




By Theorem 3.3, for any subsequence of $\mathbb{N}$ there exists a further subsequence along which $(\bar X^n,\langle\nu^n\rangle)$ converges in distribution to a limit $(X,\langle\theta\rangle)$.
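Let

$$\mathcal{W} = \Big\{x \in \mathcal{V} : \sum_{i=1}^{I+1} i\,x_i(t) \le t \text{ for } 0 \le t \le T\Big\}.$$

Due to our construction of the controlled process $\bar X^n$, we know that for each $\omega$, $\tilde X^n(\omega) \in \mathcal{W}$. For $a \in (0,\infty)\cup\{-1,-2,\ldots\}$ and $x \in \mathcal{W}$ define $g(x)$ by

$$(g(x))_i(t) = \frac{a+i}{a+t}\,x_i(t), \quad 0 \le i \le I,$$

and

$$(g(x))_{I+1}(t) = 1 - \sum_{i=0}^{I}\frac{a+i}{a+t}\,x_i(t).$$

Then $g$ maps $\mathcal{W}$ to $\mathcal{V}$. The case $a=\infty$ is defined as the obvious limit.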
When $a \in (0,\infty]$, $g$ is clearly bounded and continuous. When $a<0$ the boundedness of $g$ is not as trivial, but still elementary. We know that when $a<0$, balls are only thrown among the categories $0,1,\ldots,-a-1$. Thus if there are $n$ urns there can be at most $-an$ balls thrown, and therefore $T \le -a$. When $T=-a$ all the urns have exactly $-a$ balls, which is not an interesting case to study. We therefore assume $T<-a$. Also, because of the same restriction on the possible categories, we can (without loss) assume $I < -a-1$. Therefore, for $0 \le i \le I$ and $0 \le t \le T$,

$$0 < \frac{a+i}{a+t} = \frac{-i-a}{-t-a} \le \frac{-a}{-a-T} = \frac{a}{a+T},$$

which shows that $g$ is bounded. The argument to show continuity is similar and is omitted.
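The coefficient bound above is easy to check numerically. This NumPy sketch (hypothetical name) evaluates $(a+i)/(a+t)$ on a grid under the standing assumptions $a<0$, $T<-a$ and $I<-a-1$; it illustrates the inequality and is not a proof.

```python
import numpy as np


def coefficient_bound_holds(a, T, I, num=1001):
    """Check 0 < (a+i)/(a+t) <= a/(a+T) for 0 <= i <= I on a grid of t in [0, T].

    Assumes a < 0, T < -a and I < -a - 1, as in the text.
    """
    assert a < 0 and T < -a and I < -a - 1
    ts = np.linspace(0.0, T, num)[:, None]   # column of times
    i = np.arange(I + 1)[None, :]            # row of categories
    coef = (a + i) / (a + ts)
    bound = a / (a + T)
    return bool(np.all(coef > 0) and np.all(coef <= bound + 1e-12))
```

The bound is attained at $i=0$, $t=T$, and it blows up as $T \uparrow -a$, which is why the case $T=-a$ is excluded.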

With these definitions we have $g(\tilde X^n)(t) = \bar p^n(t)$ and $g(X)(t) = p(t,X(t))$. Since $\tilde X^n, X \in \mathcal{W}$, we have $\bar p^n$ and $p(\cdot,X(\cdot)) \in \mathcal{V}$. By the Continuous Mapping Theorem and the definition of $\bar p^n$, weak convergence of $\tilde X^n$ to $X$ implies weak convergence of $\bar p^n$ to $p(\cdot,X(\cdot))$ in $\mathcal{V}$. Applying Lemma 3.2, we have that $m\otimes\bar p^n$ converges weakly to $m\otimes p$. Similarly, by Lemma 3.1 the weak convergence of $\langle\nu^n\rangle$ to $\langle\theta\rangle$ implies the weak convergence of $m\otimes\nu^n$ to $m\otimes\theta$. Now applying Fatou's Lemma (for weak convergence) and using the lower semicontinuity of relative entropy,

$$\begin{aligned}\liminf_{n\to\infty}\bar E\Big[F(\bar X^n) + \frac{1}{n}\sum_{i=0}^{\lfloor nT\rfloor}R(\nu^{i,n}\|\bar p^{i,n})\Big] &\ge \liminf_{n\to\infty}\bar E\big[F(\bar X^n) + T\,R(m\otimes\nu^n\|m\otimes\bar p^n)\big]\\ &\ge E\big[F(X) + T\,R(m\otimes\theta\|m\otimes p)\big]\\ &= E\Big[F(X) + \int_0^T R(\theta(t)\|p(t,X(t)))\,dt\Big].\end{aligned}\tag{3.4}$$


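As proved in Theorem 3.3, $X(t) = X(0) + \int_0^t M\theta(s)\,ds$; therefore, by the definition (2.4) of the rate function $I(\varphi)$,

$$\int_0^T R(\theta(t)\|p(t,X(t)))\,dt = I(X).$$

Thus (3.4) yields

$$\liminf_{n\to\infty}\inf_{\{\nu^{i,n}\}}\bar E\Big[F(\bar X^n) + \frac{1}{n}\sum_{i=0}^{\lfloor nT\rfloor}R(\nu^{i,n}\|\bar p^{i,n})\Big] \ge E[F(X) + I(X)] \ge \inf_{\varphi\in\mathcal{U}}[I(\varphi)+F(\varphi)].$$

This completes the proof of the large deviation upper bound.

□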


4 Properties of the Rate Function

In this section, we prove some important properties of the rate function, some of which will be used later to prove the large deviation lower bound.

Theorem 4.1. Let $I$ be defined as in (2.4). Then for any $K \in [0,\infty)$ the level set $\{\varphi \in \mathcal{U} : I(\varphi) \le K\}$ is compact.

Proof. As is always the case in the weak convergence approach, the proof of compactness of level sets is essentially a deterministic analogue of the proof of the large deviation upper bound, and hence is omitted. See [3, Proposition 6.2.4] for the proof in an analogous situation. □

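Theorem 4.2 (Zero Cost Trajectory). For $t \in [0,T]$ let $f(t) = (1+t/a)^{-a}$ ($f(t) = e^{-t}$ in the case $a=\infty$),

$$\phi_i(t) = \frac{(-t)^i}{i!}f^{(i)}(t) \quad\text{for } 0 \le i \le I,$$

and let $\phi_{I+1}(t) = 1 - \sum_{i=0}^{I}\phi_i(t)$. Then $I(\phi) = 0$.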
Proof. We first assume $a \ne \infty$. It is easy to see that for any $0 \le i < \infty$

$$\frac{(-t)^i}{i!}f^{(i)}(t) \ge 0 \quad\text{and}\quad \sum_{i=0}^{\infty}\frac{(-t)^i}{i!}f^{(i)}(t) = 1; \tag{4.1}$$

the sum in (4.1) is the Taylor series of $f$ about $t$ evaluated at $0$, and $f(0) = 1$. Thus $\phi$ as defined in the statement of the theorem is indeed a probability vector. It is also clearly a continuously differentiable function. We will show that

$$\dot\phi(t) = Mp(t,\phi(t)). \tag{4.2}$$

If so, then the occupancy rate process $\theta$ associated to $\phi$ is indeed $p(t,\phi(t))$, and thus by the definition of the rate function

$$I(\phi) = \int_0^T R(\theta(t)\|p(t,\phi(t)))\,dt = 0.$$

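To show (4.2) we calculate $\phi_i(t) = \frac{(-t)^i}{i!}f^{(i)}(t)$ for $0 \le i \le I$ explicitly:

$$\phi_i(t) = \frac{t^i\prod_{j=0}^{i-1}(a+j)}{i!\,a^i}\Big(1+\frac{t}{a}\Big)^{-a-i}.$$

Hence the derivative satisfies (with the convention $\phi_{-1} \equiv 0$)

$$\dot\phi_i(t) = \frac{a+i-1}{a+t}\phi_{i-1}(t) - \frac{a+i}{a+t}\phi_i(t) = p_{i-1}(t,\phi(t)) - p_i(t,\phi(t)) = (Mp(t,\phi(t)))_i,$$

where the second equality is due to the definition of $p(t,\phi(t))$. The case of $\phi_{I+1}(t)$ is also a straightforward calculation: summing the previous display over $0 \le i \le I$ gives $\dot\phi_{I+1}(t) = -\sum_{i=0}^{I}\dot\phi_i(t) = p_I(t,\phi(t)) = (Mp(t,\phi(t)))_{I+1}$.

Next we consider the case $a=\infty$. In this case $f(t) = e^{-t}$, and the validity of (4.2) can be directly verified. □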
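A numerical spot check of (4.2) for finite $a$, using the explicit formula for $\phi_i$. For orientation, $a=1$ gives $\phi_i(t) = t^i/(1+t)^{i+1}$, and a negative integer $a$ gives binomial probabilities with $-a$ trials and success probability $t/(-a)$. The Python/NumPy sketch below (hypothetical names) compares a central difference of $\phi$ with $Mp(t,\phi(t))$; it illustrates the theorem and does not replace the proof.

```python
import math

import numpy as np


def zero_cost_path(t, a, I):
    """phi_0, ..., phi_{I+1} of Theorem 4.2 at time t, for finite a."""
    phi = np.empty(I + 2)
    for i in range(I + 1):
        coef = math.prod(a + j for j in range(i)) / (math.factorial(i) * a**i)
        phi[i] = coef * t**i * (1.0 + t / a) ** (-a - i)
    phi[I + 1] = 1.0 - phi[: I + 1].sum()
    return phi


def ode_residual(t, a, I, h=1e-5):
    """Max |phi'(t) - M p(t, phi(t))|, with phi' from a central difference."""
    dphi = (zero_cost_path(t + h, a, I) - zero_cost_path(t - h, a, I)) / (2 * h)
    phi = zero_cost_path(t, a, I)
    k = np.arange(I + 1)
    p = np.empty(I + 2)
    p[: I + 1] = (a + k) / (a + t) * phi[: I + 1]    # (2.1)
    p[I + 1] = 1.0 - p[: I + 1].sum()
    Mp = np.empty(I + 2)
    Mp[0] = -p[0]
    Mp[1 : I + 1] = p[:I] - p[1 : I + 1]              # p_{i-1} - p_i
    Mp[I + 1] = p[I]                                  # last column of M is 0
    return float(np.max(np.abs(dphi - Mp)))
```

The residual is at the level of the finite-difference error for Bose-Einstein ($a=1$), for a non-integer $a>0$, and for $a=-4$ with $t<-a$.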


Lemma 4.3. For every choice of the parameter $a$ there exist $\delta>0$ and $0<K<\infty$ so that the zero cost trajectory $\phi(t)$ is away from the boundary of $S_I$, i.e.,

$$\phi_i(t) \ge \delta t^K \tag{4.3}$$

for any $0 \le i \le I+1$ and $t \in [0,T]$.
Proof.  Note  that  when 

(fiit) 


o>0,  0  <  i  <  I  and  0  <  t  <T, 

f  nS(a  +  j)  /  |  A-- 

i\  a1  \  a  J 


(4.4) 
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and  because  of  (4.1)  we  have 


<t>i+i(t)  =  1  - 


(4.5) 


,  £§/<-(*> 

tI+1  (  T 

~  (7+ijT  V1  +  o 


—a—I—1 


(4.6) 


For  the  case  a  <  0  we  have  T  <  —a  and  a  <  —I  —  1.  Recall  that  for 
0  <i<I 

r*-l , 


<M*)  =  T 


f  n;=0 («+j)  ^  ,  ^-a-* 


i  + 


Since  a  +  /  <  —1,  a  +  j  <  —  1  for  each  0  <  j  <  I,  and  thus 

,  ,  ,  f  1  /  t 

*(f)  -  (‘  +  a 

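Moreover since $a < 0$ and $-a - I > 0$, for each fixed $a$ and $i \le I$, $(1 + t/a)^{-a-i}$
is monotone decreasing in $t$. Therefore

$$\varphi_i(t) \ge \frac{1}{i!}\left(-\frac{t}{a}\right)^i\left(1+\frac{T}{a}\right)^{-a-i}.$$

Lastly, since $T < -a$ and $a < 0$, $0 < 1 + T/a < 1$. Thus $(1 + T/a)^{-a-i}$ is
monotone increasing in $i$, and therefore

$$\varphi_i(t) \ge \frac{1}{i!}\left(-\frac{t}{a}\right)^i\left(1+\frac{T}{a}\right)^{-a}.$$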


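For $\varphi_{I+1}(t)$ we have

$$\begin{aligned}
\varphi_{I+1}(t) &= 1 - \sum_{i=0}^{I}\varphi_i(t)\\
&\ge \frac{t^{I+1}\prod_{j=0}^{I}(a+j)}{(I+1)!\,a^{I+1}}\left(1+\frac{T}{a}\right)^{-a-I-1}\\
&\ge \frac{1}{(I+1)!}\left(-\frac{t}{a}\right)^{I+1}\left(1+\frac{T}{a}\right)^{-a}.
\end{aligned}$$

The last inequality follows from exactly the same reasoning as for $0 \le i \le I$.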

Finally, for the case $a = \infty$ we just take limits in (4.4) and (4.5), and
use that

$$\lim_{a\to\infty}\left(1+\frac{T}{a}\right)^{-a-I} = e^{-T}.$$

It follows that for every choice of the parameter $a$ there exist $\delta > 0$ and
$0 < K < \infty$ so that the zero cost trajectory $\varphi(t)$ satisfies (4.3) for any
$0 \le i \le I+1$. □
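A small numerical check of Lemma 4.3, written as a sketch: it turns the explicit lower bounds from the proof into constants $\delta$ and $K = I+1$ (using $t^i \ge t^{I+1}T^{i-I-1}$ on $(0,T]$) and then tests $\varphi_i(t) \ge \delta t^K$ on a grid. The choice $K = I+1$ and the function names are illustrative assumptions, and the $a < 0$ branch relies on the standing assumptions $T < -a$ and $a < -I-1$.

```python
import math


def zero_cost(t, a, I):
    """Zero cost trajectory phi(t) (finite a, or a = math.inf for the Poisson limit)."""
    phi = []
    for i in range(I + 1):
        if math.isinf(a):
            phi.append(t**i * math.exp(-t) / math.factorial(i))
        else:
            coef = 1.0
            for j in range(i):
                coef *= (a + j) / a
            phi.append(coef * t**i / math.factorial(i) * (1 + t / a) ** (-a - i))
    phi.append(1.0 - sum(phi))
    return phi


def lemma43_constants(a, I, T):
    """(delta, K) with phi_i(t) >= delta * t**K on (0, T], following the proof.

    Coefficient c_i of t**i in the lower bound for phi_i:
      a > 0:   prod_{j<i}(a+j) / (i! a^i) * (1 + T/a)^(-a-i)   [(4.4), (4.6)]
      a < 0:   (-1/a)^i / i! * (1 + T/a)^(-a)
      a = inf: e^{-T} / i!                                    [limit a -> inf]
    Then t**i >= t**K * T**(i - K) on (0, T] with K = I + 1.
    """
    K = I + 1
    coeffs = []
    for i in range(I + 2):
        if math.isinf(a):
            c = math.exp(-T) / math.factorial(i)
        elif a > 0:
            c = 1.0
            for j in range(i):
                c *= (a + j) / a
            c *= (1 + T / a) ** (-a - i) / math.factorial(i)
        else:
            assert T < -a and a < -I - 1, "standing assumptions of the proof for a < 0"
            c = (-1.0 / a) ** i * (1 + T / a) ** (-a) / math.factorial(i)
        coeffs.append(c * T ** (i - K))
    return min(coeffs), K


def check_bound(a, I, T, steps=200):
    """True if phi_i(t) >= delta * t**K at every grid point of (0, T] and every i."""
    delta, K = lemma43_constants(a, I, T)
    grid = [T * k / steps for k in range(1, steps + 1)]
    return all(p >= delta * t**K for t in grid for p in zero_cost(t, a, I))


for a, I, T in [(2.0, 3, 1.5), (0.5, 2, 3.0), (-5.5, 3, 2.0), (math.inf, 3, 2.0)]:
    print(a, I, T, lemma43_constants(a, I, T), check_bound(a, I, T))
```

The constants produced this way are far from sharp; the point is only that a single power $t^{I+1}$ controls every component near $t = 0$.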

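Lemma 4.4. For a given value of $a$ let the parameters $\delta$ and $K$ be as in
(4.3). Let $\psi \in \mathcal{U}$ satisfy $I(\psi) < \infty$. Then for any $\varepsilon > 0$ there exists $\psi^\varepsilon \in \mathcal{U}$
such that

1. $I(\psi^\varepsilon) \le I(\psi)$,

2. $d(\psi, \psi^\varepsilon) \le \varepsilon$,

3. $\psi^\varepsilon_i(t) \ge \varepsilon\delta t^K$ for all $t \in [0, T]$ and $i = 0, 1, \ldots, I, I+1$.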

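Proof. For any $\varepsilon > 0$ and $\psi \in \mathcal{U}$, let

$$\psi^\varepsilon = (1-\varepsilon)\psi + \varepsilon\varphi,$$

where $\varphi$ is the zero cost trajectory. Then $\psi^\varepsilon \in \mathcal{U}$. From the definition
of $p(t,x)$ in (2.1) it follows that $p(t,x)$ is linear in $x$, and since $M$ is linear, $(1-\varepsilon)\theta + \varepsilon p(t,\varphi(t))$ is a rate process for $\psi^\varepsilon$, where $\theta$ is the rate process of $\psi$. Also, recalling the
definition of $I(\psi)$ in (2.4) and the joint convexity of relative entropy, we find
that $I(\psi)$ is convex in $\psi$. Therefore

$$I(\psi^\varepsilon) \le (1-\varepsilon) I(\psi) + \varepsilon I(\varphi) = (1-\varepsilon) I(\psi) \le I(\psi).$$

Since $d(\psi, \varphi) \le 1$,

$$d(\psi, \psi^\varepsilon) \le \varepsilon\, d(\psi, \varphi) \le \varepsilon,$$

and also $\psi^\varepsilon \ge \varepsilon\varphi \ge \varepsilon\delta t^K$. □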
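The pointwise inequality behind $I(\psi^\varepsilon) \le (1-\varepsilon)I(\psi)$ is joint convexity of $R$ applied with an identical second pair: $R((1-\varepsilon)\theta + \varepsilon q \,\|\, (1-\varepsilon)p + \varepsilon q) \le (1-\varepsilon)R(\theta\|p)$, where $q$ plays the role of $p(t,\varphi(t))$. The sketch below tests this on random probability vectors. It is a sanity check under that reading of the proof, not a substitute for it, and the helper names are hypothetical.

```python
import math
import random


def rel_entropy(mu, nu):
    """R(mu || nu) = sum_i mu_i log(mu_i / nu_i), using 0 log 0 = 0."""
    total = 0.0
    for m, v in zip(mu, nu):
        if m > 0:
            if v == 0:
                return math.inf
            total += m * math.log(m / v)
    return total


def random_prob(k, rng):
    """A random probability vector of length k with strictly positive entries."""
    w = [rng.random() + 1e-9 for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]


def mix(u, v, eps):
    """(1 - eps) u + eps v, componentwise."""
    return [(1 - eps) * x + eps * y for x, y in zip(u, v)]


def integrand_bound_holds(trials=2000, k=5, seed=1):
    """Check R(mix(theta, q) || mix(p, q)) <= (1 - eps) R(theta || p) on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        theta, p, q = (random_prob(k, rng) for _ in range(3))
        eps = rng.random()
        lhs = rel_entropy(mix(theta, q, eps), mix(p, q, eps))
        rhs = (1 - eps) * rel_entropy(theta, p)
        if lhs > rhs + 1e-12:  # small slack for floating-point rounding
            return False
    return True


print(integrand_bound_holds())
```

Integrating this inequality over $[0,T]$ gives exactly the convexity step used in the proof.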

The  final  theorem  of  this  section  is  essential  in  proving  the  large  deviation 
lower  bound. 

Definition 4.5 (Good Path). We call an occupancy process $\psi \in \mathcal{U}$ a
"good path" if there exist constants $0 < \delta', K' < \infty$ so that $\psi_i(t) \ge \delta' t^{K'}$ for
$t \in [0, T]$ and $0 \le i \le I + 1$.
Definition 4.6 (Good Control). We call an occupancy rate process $\theta$ a
"good control" if $\theta$ is piecewise constant on $[0,T]$, with a finite
number of intervals of constancy. In other words, there exist a finite number
of intervals $[r_i, s_i]$, $1 \le i \le m$, so that $[0,T] = \bigcup_{i=1}^m [r_i, s_i]$, and $\theta(t)$ is a
constant vector on each $(r_i, s_i)$. In addition, we assume there exists $0 < \sigma < T$
so that $\theta$ is "pure" on $[0,\sigma)$, in the sense that for any interval of
constancy $(r, s) \subset [0, \sigma)$, there exists $i$, $0 \le i \le I + 1$, such that $\theta_i(t) = 1$ for
$t \in (r, s)$.

Theorem 4.7. For a good path $\psi \in \mathcal{U}$ assume $I(\psi) < \infty$. Let $\delta', K'$ be the
associated constants in the definition of a good path. For any $\varepsilon > 0$ there
exists a good control $\theta^*$ and associated $\sigma > 0$ so that if $\psi^*$ is the occupancy
path associated to $\theta^*$, then

1. $I(\psi^*) \le I(\psi) + \varepsilon$,

2. $d(\psi^*, \psi) \le \varepsilon$,

3. if $t < \sigma$ and $\theta^*_i(t) = 1$ then $\psi^*_i(t) \ge \delta'\sigma^{K'}$.

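Proof. For a $\sigma > 0$ that will be specified later on, we construct a pure
control $\theta^*(t)$, $t \in [0, \sigma)$, as follows. For $0 \le i \le I$ let $\theta^*_i(t) = 1$ if

$$\sum_{j=0}^{i} j\,\psi_j(\sigma) + \sum_{k=i+1}^{I+1} i\,\psi_k(\sigma) \le t < \sum_{j=0}^{i} j\,\psi_j(\sigma) + \sum_{k=i+1}^{I+1} (i+1)\,\psi_k(\sigma),$$

and let $\theta^*_{I+1}(t) = 1$ if

$$\sum_{i=0}^{I} i\,\psi_i(\sigma) + (I+1)\,\psi_{I+1}(\sigma) \le t < \sigma.$$

Observe that the component $\psi^*_i$ increases only during the interval when
$\theta^*_{i-1}(t) = 1$, and that it decreases to its final value while $\theta^*_i(t) = 1$. Observe
also that $\psi^*(\sigma) = \psi(\sigma)$. Hence for $t < \sigma$, if $\theta^*_i(t) = 1$ then
$\psi^*_i(t) \ge \psi^*_i(\sigma) \ge \delta'\sigma^{K'}$.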
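The pure control on $[0,\sigma)$ is easy to implement because $\theta^*$ is piecewise constant and, under the form of $M$ assumed here ($(M\theta)_i = \theta_{i-1}-\theta_i$ for $i \le I$, $(M\theta)_{I+1} = \theta_I$), each phase simply moves mass from category $i$ to category $i+1$ at unit rate. The sketch below also assumes empty initial conditions, $\psi(0) = (1, 0, \ldots, 0)$, and the target vector `y` standing in for $\psi(\sigma)$ is made up for illustration. It checks that $\psi^*(\sigma) = \psi(\sigma)$ and that $\psi^*_i$ never drops below $\psi_i(\sigma)$ while $\theta^*_i = 1$.

```python
def pure_control_schedule(y, sigma):
    """Phases (i, start, end) of the pure control on [0, sigma), with y = psi(sigma).

    Phase i <= I sets theta*_i = 1 on
      [sum_{j<=i} j y_j + i * sum_{k>i} y_k, sum_{j<=i} j y_j + (i+1) * sum_{k>i} y_k),
    and theta*_{I+1} = 1 on the remaining interval up to sigma.
    """
    I = len(y) - 2
    phases = []
    for i in range(I + 1):
        base = sum(j * y[j] for j in range(i + 1))
        tail = sum(y[k] for k in range(i + 1, I + 2))
        phases.append((i, base + i * tail, base + (i + 1) * tail))
    start = sum(j * y[j] for j in range(I + 1)) + (I + 1) * y[I + 1]
    if start > sigma + 1e-12:
        raise ValueError("psi(sigma) needs more balls than sigma allows")
    phases.append((I + 1, start, sigma))
    return phases


def run_pure_control(y, sigma):
    """Integrate psi' = M theta* exactly, starting from empty urns (assumption)."""
    I = len(y) - 2
    psi = [1.0] + [0.0] * (I + 1)
    lowest_while_active = []
    for i, r, s in pure_control_schedule(y, sigma):
        if i <= I:  # mass s - r moves from category i to category i + 1
            psi[i] -= s - r
            psi[i + 1] += s - r
        # psi_i decreases throughout its own phase, so its end value is the minimum
        lowest_while_active.append(psi[i])
    return psi, lowest_while_active


y = [0.5, 0.3, 0.15, 0.05]  # hypothetical psi(sigma) with I = 2
sigma = 0.8                 # needs 1*0.3 + 2*0.15 + 3*0.05 = 0.75 <= sigma
print(run_pure_control(y, sigma))
```

In the proof $\sigma$ is small and $\psi(\sigma)$ is close to $(1, 0, \ldots, 0)$; the larger numbers here only make the phases easy to read.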

Now assume that $0 < a < \infty$. For such $i$ and $t$,

$$p_i(t, \psi^*(t)) = \frac{(a+i)\,\psi^*_i(t)}{a+t} \ge \frac{a\,\delta'\sigma^{K'}}{a+T} = \delta''\sigma^{K'}, \qquad (4.7)$$

where $\delta''$ is defined as $a\delta'/(a+T)$. Note that the above bound is true for all
$0 \le i \le I + 1$.

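Recall that when $a < 0$ we can assume without loss that $a + 1 + I < 0$,
and that no balls are placed in urns that currently contain more than $I$ balls.
It follows that $p_{I+1}(t, x) = 0$ for all $t$ and $x$, and that $\theta_{I+1}(t)$ is always zero. Hence the same is true of $\theta^*_{I+1}(t)$, i.e.,
$\theta^*_{I+1}(t) = 0$ for all $t \in [0, \sigma)$. For $0 \le i \le I$, we have

$$p_i(t, \psi^*(t)) = \frac{(a+i)\,\psi^*_i(t)}{a+t} \ge \frac{a+i}{a+t}\,\delta'\sigma^{K'} \ge \frac{a+I}{a}\,\delta'\sigma^{K'} \ge -\frac{1}{a}\,\delta'\sigma^{K'}.$$

Thus for such $i$ there exists a constant $\delta'' > 0$ so that $p_i(t, \psi^*(t)) \ge \delta''\sigma^{K'}$
when $\theta^*_i(t) = 1$.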

Finally, when $a = \infty$ we can choose $\delta'' = \delta'$ and (4.7) will hold.

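This completes the construction of $\theta^*$ and $\psi^*$ on $[0, \sigma)$. The lower bounds
on the $p_i$ and the fact that $\theta^*$ is pure on $[0, \sigma)$ imply

$$\int_0^\sigma R\big(\theta^*(t) \,\|\, p(t, \psi^*(t))\big)\, dt \le -\sigma \log\left(\delta''\sigma^{K'}\right).$$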

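Now let us choose $\sigma$ small enough so that $\int_0^\sigma R\big(\theta^*(t) \,\|\, p(t, \psi^*(t))\big)\, dt \le \varepsilon/2$
and $\sup_{t\in[0,\sigma)} |\psi^*(t) - \psi(t)| \le \varepsilon$. Also, recall that under the construction

$$\psi^*(\sigma) = \psi(\sigma).$$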
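The reason $\sigma$ can be taken small enough is that a pure control has an explicit relative entropy: if $\theta^*(t) = e_i$, the $i$th unit vector, then $R(e_i \,\|\, q) = -\log q_i$. Writing $i(t)$ for the index with $\theta^*_{i(t)}(t) = 1$ (notation introduced only for this display), and using the lower bound $p_{i(t)}(t, \psi^*(t)) \ge \delta''\sigma^{K'}$ established above, the cost on $[0,\sigma)$ unpacks as

$$\begin{aligned}
\int_0^\sigma R\big(\theta^*(t) \,\|\, p(t, \psi^*(t))\big)\, dt
&= -\int_0^\sigma \log p_{i(t)}(t, \psi^*(t))\, dt \\
&\le -\sigma \log\left(\delta''\sigma^{K'}\right)
= -\sigma\log\delta'' - K'\sigma\log\sigma \;\longrightarrow\; 0 \quad \text{as } \sigma \downarrow 0,
\end{aligned}$$

since $\delta''$ does not depend on $\sigma$ and $\sigma\log\sigma \to 0$.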

The construction of controls on $[\sigma, T]$ is easier. Let $\theta(t)$ be the rate
process associated with $\psi(t)$ by (2.3). For $M \in \mathbb{N}$ we partition $[\sigma, T]$ into
$M$ subintervals of length $c_M = (T - \sigma)/M$. For each $s$ such that
$\sigma + l c_M \le s < \sigma + (l+1) c_M$, where $0 \le l \le M - 1$, let

$$\theta^{(M)}(s) = \frac{1}{c_M}\int_{\sigma + l c_M}^{\sigma + (l+1) c_M} \theta(t)\, dt.$$

Let $\psi^{(M)}$ be the occupancy path associated with $\theta^{(M)}(t)$. Then it is easy
to check that $\psi^{(M)}(t)$ coincides with $\psi(t)$ on the "partition points" in $[\sigma, T]$,
i.e., those points of the form $\{\sigma + l c_M : 0 \le l \le M - 1\}$. Thus for $M$ large
enough (e.g., $M \ge (T - \sigma)/\varepsilon$), $\sup_{t\in[\sigma,T]} |\psi^{(M)}(t) - \psi(t)| \le \varepsilon$.
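The averaging step can also be sketched numerically. Because $\psi(t) = \psi(\sigma) + M\int_\sigma^t \theta(s)\,ds$ is linear in $\theta$, replacing $\theta$ by its cell averages leaves $\psi$ unchanged at every partition point. The code below assumes the same form of $M$ as in the earlier sketches, approximates integrals with a composite Simpson rule, and drives everything with a made-up control `theta` having three categories ($I = 1$); none of these choices come from the paper.

```python
import math


def apply_M(theta):
    """Assumed M: (M theta)_i = theta_{i-1} - theta_i for i <= I, (M theta)_{I+1} = theta_I."""
    I = len(theta) - 2
    out = [(theta[i - 1] if i > 0 else 0.0) - theta[i] for i in range(I + 1)]
    out.append(theta[I])
    return out


def simpson(f, lo, hi, n=64):
    """Composite Simpson rule for a vector-valued f on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    acc = [0.0] * len(f(lo))
    for k in range(n + 1):
        w = 1 if k in (0, n) else (4 if k % 2 else 2)
        acc = [s + w * v for s, v in zip(acc, f(lo + k * h))]
    return [s * h / 3 for s in acc]


def averaged_path(theta, psi_sigma, sigma, T, M):
    """Return t -> psi^(M)(t) on [sigma, T], driven by the cell averages theta^(M)."""
    c = (T - sigma) / M
    avg = [[v / c for v in simpson(theta, sigma + l * c, sigma + (l + 1) * c)]
           for l in range(M)]

    def path(t):
        l = min(int((t - sigma) / c), M - 1)
        # integral of theta^(M) over [sigma, t]: l full cells plus part of cell l
        integral = [c * sum(avg[m][j] for m in range(l)) + (t - sigma - l * c) * avg[l][j]
                    for j in range(len(psi_sigma))]
        return [x + d for x, d in zip(psi_sigma, apply_M(integral))]

    return path


def true_path(theta, psi_sigma, sigma, t):
    """psi(t) = psi(sigma) + M int_sigma^t theta(s) ds, using the linearity of M."""
    if t <= sigma:
        return list(psi_sigma)
    integral = simpson(theta, sigma, t, n=256)
    return [x + d for x, d in zip(psi_sigma, apply_M(integral))]


def theta(t):
    """Made-up rate process with I = 1; each value is a probability vector."""
    s = 0.3 * math.sin(5 * t)
    return [0.4 + s, 0.4 - s, 0.2]


psi_sigma, sigma, T = [0.6, 0.3, 0.1], 0.1, 1.0
for M in (5, 40):
    path = averaged_path(theta, psi_sigma, sigma, T, M)
    grid = [sigma + (T - sigma) * k / 200 for k in range(201)]
    gap = max(max(abs(u - v) for u, v in zip(path(t), true_path(theta, psi_sigma, sigma, t)))
              for t in grid)
    print(M, gap)
```

Between partition points the two paths can differ, but only by an amount that shrinks with the cell length $c_M$.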

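Because $\psi(t)$ is good, when $t \ge \sigma$ we have $\psi_i(t) \ge \delta' t^{K'} \ge \delta'\sigma^{K'} > 0$.
Therefore $\psi(t)$ is uniformly bounded away from the boundary after time
$\sigma$. As $M \to \infty$, $\theta^{(M)}(t)$ converges to $\theta(t)$ and $\psi^{(M)}(t)$ converges to $\psi(t)$ a.e.,
and thus by the Lebesgue Dominated Convergence Theorem

$$\lim_{M\to\infty} \int_\sigma^T R\big(\theta^{(M)}(t) \,\|\, p(t, \psi^{(M)}(t))\big)\, dt = \int_\sigma^T R\big(\theta(t) \,\|\, p(t, \psi(t))\big)\, dt.$$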

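Now choose $M < \infty$ large enough so that
$\int_\sigma^T R\big(\theta^{(M)}(t) \,\|\, p(t, \psi^{(M)}(t))\big)\, dt \le \int_\sigma^T R\big(\theta(t) \,\|\, p(t, \psi(t))\big)\, dt + \varepsilon/2$.
Let $\theta^*$ be defined as it was previously on $[0, \sigma)$, and set it equal to $\theta^{(M)}$ on $[\sigma, T]$. We have

$$\begin{aligned}
I(\psi^*) &= \int_\sigma^T R\big(\theta^{(M)}(t) \,\|\, p(t, \psi^{(M)}(t))\big)\, dt + \int_0^\sigma R\big(\theta^*(t) \,\|\, p(t, \psi^*(t))\big)\, dt \\
&\le \int_\sigma^T R\big(\theta(t) \,\|\, p(t, \psi(t))\big)\, dt + \varepsilon/2 + \varepsilon/2 \\
&\le I(\psi) + \varepsilon.
\end{aligned}$$

This completes the proof. □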


5  The  Large  Deviation  Lower  Bound 


Theorem 5.1. Define $I$ by (2.4) for any of the occupancy models discussed
in Section 2. If $F : \mathcal{U} \to \mathbb{R}$ is bounded and continuous, then

$$\limsup_{n\to\infty} -\frac{1}{n}\log E \exp\left[-nF(X^n)\right] \le \inf_{\varphi\in\mathcal{U}} \left[I(\varphi) + F(\varphi)\right].$$


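Proof. According to (2.5), the theorem follows if

$$\limsup_{n\to\infty} \inf_{\{\nu^{i,n}\}} E\left[F(X^n) + \frac{1}{n}\sum_{i=0}^{\lfloor nT\rfloor - 1} R\left(\nu^{i,n} \,\middle\|\, p\left(\tfrac{i}{n}, X^n\left(\tfrac{i}{n}\right)\right)\right)\right] \le \inf_{\varphi\in\mathcal{U}} \left[I(\varphi) + F(\varphi)\right].$$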


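For any $\varphi \in \mathcal{U}$ such that $I(\varphi) < \infty$, Lemma 4.4 and Theorem 4.7 imply
that for any $\varepsilon > 0$ there exists $(\varphi^*, \theta^*)$ with the properties described in
Theorem 4.7 (Lemma 4.4 yields a good path close to $\varphi$ with no larger cost, and Theorem 4.7 is then applied to that path). Since $F$ is continuous on $\mathcal{U}$, we only need to show that there
exists a sequence of admissible controls $\{\nu^n\}$ so that

$$\limsup_{n\to\infty} E\left[F(X^n) + \frac{1}{n}\sum_{i=0}^{\lfloor nT\rfloor - 1} R\left(\nu^{i,n} \,\middle\|\, p\left(\tfrac{i}{n}, X^n\left(\tfrac{i}{n}\right)\right)\right)\right] \le I(\varphi^*) + F(\varphi^*).$$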
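The latter inequality will follow if we can find a sequence of admissible $\{\nu^n\}$
such that

$$\limsup_{n\to\infty} E\left[\frac{1}{n}\sum_{i=0}^{\lfloor nT\rfloor - 1} R\left(\nu^{i,n} \,\middle\|\, p\left(\tfrac{i}{n}, X^n\left(\tfrac{i}{n}\right)\right)\right)\right] \le I(\varphi^*), \qquad (5.1)$$

and such that if $X^n$ is the occupancy process constructed under $\{\nu^n\}$ then
for any small $\delta > 0$

$$\limsup_{n\to\infty} P\left\{d(X^n, \varphi^*) > \delta\right\} = 0. \qquad (5.2)$$

In other words, $X^n$ converges to $\varphi^*$ in probability.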

To prove the desired inequalities (5.1) and (5.2) we need to construct
the proper $\{\nu^n\}$. Recall that $\{\nu^{i,n}\}$ can depend in any measurable way on
the "past," and so we could, in principle, use such information in constructing
the controls. However, it turns out that we can construct the controls
without reference to the controlled process (so-called "open loop" controls).
Let $\theta^*$ be the good control as described in Theorem 4.7. We know that $\theta^*$
is piecewise constant and pure up to time $\sigma > 0$. We also know that before
time $\sigma$, if $\theta^*_i(t) = 1$ then both $p_i(t, \varphi^*(t))$ and $\varphi^*_i(t)$ are greater than a
fixed value $\zeta > 0$. We can also assume for the same value of $\zeta$ that both
$p_i(t, \varphi^*(t))$ and $\varphi^*_i(t)$ are greater than $\zeta$ for all $i \in \{0, 1, \ldots, I, I+1\}$ and all
$t \in [\sigma, T]$.

Although the limit trajectory stays away from the boundary after time
$\sigma$, there is no guarantee that the random process $X^n$ is uniformly bounded
away. In order to handle this possibility, we use a stopping time argument
similar to one used in [8].

Let $l_n/n$ be the minimum of the first time such that for some $i$, $X^n_i(l_n/n) \le \zeta/2$
and $\theta^*_i(l_n/n) > 0$, and the fixed deterministic time $T$. This is the first
time the random process is close to the boundary, and hence there is the
possibility of a large contribution to the total cost [note that when $\theta^*_i(l_n/n) = 0$
there is no contribution to the cost regardless of the value of $X^n_i(l_n/n)$]. The
control $\{\nu^n\}$ is then defined by

$$\nu^{i,n} = \begin{cases} \theta^*(i/n) & \text{if } i < l_n, \\ p\left(i/n, X^n(i/n)\right) & \text{if } i \ge l_n. \end{cases}$$

Prior to the stopping time, we use exactly what $\theta^*$ suggests, and after the
stopping time we follow the law of large numbers trajectory (and therefore
incur no additional cost).
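The control $\nu^{i,n}$ and the stopping index $l_n$ can be simulated directly on category counts. A ball placed in category $j \le I$ moves one urn from category $j$ to category $j+1$, and a ball placed in category $I+1$ changes nothing. The sketch below assumes $a > 0$ and the rate form $p_i(t,x) = (a+i)x_i/(a+t)$ for $i \le I$ with the remaining mass sent to category $I+1$; `controlled_run`, `rates`, `empty_first` and `x0` are hypothetical names. Taking $\zeta$ very large forces $l_n = 0$, so the run follows $p$ throughout, and the final fractions should then be close to the zero cost trajectory at time $T$.

```python
import random


def rates(t, x, a):
    """Assumed p(t, x) for a > 0: (a+i) x_i / (a+t) for i <= I, remaining mass to I+1."""
    p = [(a + i) * xi / (a + t) for i, xi in enumerate(x[:-1])]
    p.append(max(0.0, 1.0 - sum(p)))
    return p


def controlled_run(n, T, a, x0, theta_star, zeta, seed=0):
    """Simulate X^n with nu^{i,n} = theta*(i/n) for i < l_n and p(i/n, X^n(i/n)) after.

    l_n is the first i with X^n_j(i/n) <= zeta/2 and theta*_j(i/n) > 0 for some j,
    or floor(nT) if that never happens. Returns (X^n(floor(nT)/n), l_n).
    """
    rng = random.Random(seed)
    counts = [round(n * v) for v in x0]  # number of urns in each category
    K = len(counts)                      # K = I + 2 categories
    steps = int(n * T)
    l_n = steps
    for i in range(steps):
        t, x = i / n, [c / n for c in counts]
        th = theta_star(t)
        if l_n == steps and any(x[j] <= zeta / 2 and th[j] > 0 for j in range(K)):
            l_n = i
        nu = th if i < l_n else rates(t, x, a)
        j = rng.choices(range(K), weights=nu)[0]
        if j < K - 1:  # a ball in category j <= I moves that urn to category j + 1
            counts[j] -= 1
            counts[j + 1] += 1
    return [c / n for c in counts], l_n


def empty_first(t):
    """Hypothetical pure control: every ball is sent to an empty urn (category 0)."""
    return [1.0, 0.0, 0.0, 0.0]


print(controlled_run(1000, 1.0, 2.0, [1.0, 0.0, 0.0, 0.0], empty_first, zeta=0.8))
```

Before $l_n$ the active category always has $X^n_j > \zeta/2 > 0$, so the update never removes an urn from an empty category; after $l_n$ the weights $p_j$ vanish on empty categories for the same reason.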

Now we apply Theorem 3.3. Thus given any subsequence we have convergence
along a further subsequence as indicated in the theorem, with limit
$(X, \theta)$. Using a standard argument by contradiction, it will be enough to
prove (5.1) and (5.2) for this convergent subsequence. Let $\tau_n = l_n/n \le T$.
Note that because the applied controls are pure, the process $X^n(t)$ is
deterministic prior to $\sigma$, and also that prior to this time, the time derivatives
of $X^n(t)$ and $\varphi^*(t)$ are piecewise constant. In fact, the two derivatives are
identical except possibly on a bounded number of intervals each of length
less than $1/n$ (the points where they may disagree are all located within
distance $1/n$ of the endpoints of the intervals of constancy of $\varphi^*(t)$). Thus
for large $n$ we cannot have $\tau_n < \sigma$. Since the range of $\tau_n$ is a bounded set in
$\mathbb{R}$, we can also assume $\tau_n$ converges in distribution to a limit $\tau$, and without
loss we assume the convergence is along the same subsequence. Since $\tau_n \ge \sigma$
for large $n$ we have $\tau \ge \sigma$ w.p.1. It is easy to check that the limit control
process w.p.1 satisfies

$$\theta(t) = \begin{cases} \theta^*(t) & \text{if } t < \tau, \\ p(t, X(t)) & \text{if } t \ge \tau. \end{cases}$$


Owing to the definition of $\tau_n$, if $\tau < T$ then $X_i(\tau) \le \zeta/2$ for some
$i \in \{0, 1, \ldots, I+1\}$ (although $\varphi^*_i(t) > \zeta$ when $t \in [\sigma, T]$).

We use that $\theta(t) = \theta^*(t)$ when $t < \tau$ and that $\theta^*(t)$ is deterministic.
As shown in Theorem 3.3, $(X, \theta^*)$ satisfies (2.3) for $t \in [0, \tau]$. Thus for
$t \in [0, \tau]$,

$$X(t) = \varphi^*(t) \quad \text{w.p.1}.$$

If $\tau < T$, this forms a contradiction since

$$X_i(\tau) \le \zeta/2 < \zeta < \varphi^*_i(\tau).$$

Therefore $\tau = T$, and thus for all $t \in [0, T]$

$$X(t) = \varphi^*(t) \quad \text{w.p.1}.$$

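This also indicates that the weak limit of the random processes is indeed
$(\varphi^*, \theta^*)$, which implies (5.2). To prove (5.1), we use the weak
convergence and the Dominated Convergence Theorem:

$$\limsup_{n\to\infty} E\left[\frac{1}{n}\sum_{i=0}^{\lfloor nT\rfloor - 1} R\left(\nu^{i,n} \,\middle\|\, p\left(\tfrac{i}{n}, X^n\left(\tfrac{i}{n}\right)\right)\right)\right] = \int_0^T R\big(\theta^*(t) \,\|\, p(t, \varphi^*(t))\big)\, dt = I(\varphi^*).$$

This completes the proof. □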
6 Explicit Formula for the LDP of the Process at a Given Time


In  the  previous  sections  we  have  identified  the  large  deviation  rate  function 
(2.4)  for  a  class  of  occupancy  problems.  The  large  deviation  principle  for  the 
process  at  a  given  fixed  time  can  then  be  expressed  in  terms  of  the  solution  to 
a  calculus  of  variations  problem.  In  this  section  we  state  a  conjecture  on  the 
solution  to  this  problem.  The  explicit  formula  is  analogous  to  one  obtained 
in  [8]  for  the  case  of  MB  statistics,  and  it  is  possible  that  the  techniques 
developed  there  could  be  used  here  as  well.  At  the  present  time,  however, 
we  prefer  to  simply  state  the  result  as  a  conjecture  in  order  to  pursue  a 
potentially  more  general  approach  that  would  include  such  generalizations 
as  urn  models  with  balls  of  different  types. 

Since the Maxwell-Boltzmann case is rigorously analyzed in [8], we also
assume $a < \infty$ (the formal statement can of course be obtained as a limit).
By the contraction principle, the large deviation rate function for
an ending point $u$ is given by

$$J(u) = \inf\left\{ I(\varphi) : \varphi \in C([0,T] : S_I),\ \varphi(T) = u \right\}.$$


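Define

$$(a)_i = \prod_{j=0}^{i-1}(a+j)$$

for all $i \in \mathbb{N}$, and also

$$Q^a_i(x) = \frac{(a)_i}{i!\, a^i}\, x^i \left(1 + \frac{x}{a}\right)^{-a-i}$$

for all $i \in \mathbb{N}$ and $x \in [0, T]$.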
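For $a > 0$ the weights $Q^a_i(x)$ are negative binomial probabilities with mean $x$, and when $a$ is a negative integer $-m$ (with $0 \le x \le m$) they reduce to binomial$(m, x/m)$ probabilities; they also have the same form as the components $\varphi_i(x)$ of the zero cost trajectory in Section 4. The sketch below computes them through the ratio $Q^a_i(x)/Q^a_{i-1}(x) = \frac{a+i-1}{i}\cdot\frac{x}{a+x}$, which avoids overflowing $i!\,a^i$, and checks normalization and the mean. The truncation length `N` and the function names are assumptions for illustration.

```python
import math


def Q_vector(a, x, N):
    """Q^a_0(x), ..., Q^a_{N-1}(x), using Q^a_0(x) = (1 + x/a)^(-a) and
    Q^a_i(x) / Q^a_{i-1}(x) = (a + i - 1) / i * x / (a + x)."""
    q = [(1 + x / a) ** (-a)]
    for i in range(1, N):
        q.append(q[-1] * (a + i - 1) / i * x / (a + x))
    return q


def pochhammer(a, i):
    """(a)_i = prod_{j=0}^{i-1} (a + j)."""
    out = 1.0
    for j in range(i):
        out *= a + j
    return out


for a, x in [(2.0, 1.5), (0.7, 1.5), (-6.0, 2.0)]:
    q = Q_vector(a, x, 400)
    mean = sum(i * qi for i, qi in enumerate(q))
    print(a, x, sum(q), mean)
```

For $a = -6$ every term with $i \ge 7$ is exactly zero, because the factor $a + 6$ appears in the ratio.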

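Denote $\pi^k = (\pi^k_0, \pi^k_1, \ldots) \in \mathcal{M}_\infty$ for all $0 \le k \le I + 1$, where $\pi^k_i$
represents the probability of throwing $i$ additional balls into an urn that is initially in the $k$th category.
Denote $\pi = (\pi^0, \pi^1, \ldots, \pi^I, \pi^{I+1})$, so that $\pi \in (\mathcal{M}_\infty)^{I+2}$. For any given $\alpha \in S_I$,
we say $\pi = (\pi^0, \pi^1, \ldots, \pi^I, \pi^{I+1}) \in \Gamma(\alpha, u, T)$ if

$$\sum_{j=0}^{\infty} \pi^k_j = 1, \quad 0 \le k \le I + 1,$$

and

$$u_i = \sum_{k=0}^{i} \alpha_k \pi^k_{i-k}, \quad 0 \le i \le I.$$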
k= 1 
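With the above notation, we conjecture that in the case of empty initial
conditions

$$J(u) = \min_{\pi^0} R\left(\pi^0 \,\middle\|\, Q^a(T)\right),$$

where $Q^a(T) = (Q^a_i(T))_{i\in\mathbb{N}}$ and $\pi^0$ satisfies

$$\pi^0_i = u_i, \ 0 \le i \le I, \qquad \sum_{i=0}^{\infty} \pi^0_i = 1, \qquad \text{and} \qquad \sum_{i=0}^{\infty} i\,\pi^0_i = T.$$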
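Under this conjecture, the empty-initial-condition case is a relative entropy minimization with linear constraints, which is what makes it finite dimensional in practice. Assuming a minimizer of the usual Lagrange-multiplier form exists, the minimizing $\pi^0$ agrees with $u$ for $0 \le i \le I$ and has tail $\pi^0_i = c\,Q^a_i(T)\,e^{\lambda i}$ for $i > I$, with $c$ and $\lambda$ fixed by the mass and mean constraints. The sketch below assumes $a > 0$, truncates the sequence at `N` terms, restricts $\lambda$ to a bracket where the tilted tail stays summable, and finds $\lambda$ by bisection. It illustrates the conjectured finite dimensional problem and is not the authors' method; all names are hypothetical.

```python
import math


def log_Q(a, x, N):
    """log Q^a_i(x) for 0 <= i < N (assumes a > 0 and x > 0)."""
    out = [-a * math.log1p(x / a)]
    for i in range(1, N):
        out.append(out[-1] + math.log((a + i - 1) / i) + math.log(x / (a + x)))
    return out


def logsumexp(vals):
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))


def tail_mean(lq, I, lam):
    """Mean of i over i > I under weights Q_i exp(lam * i), truncated at len(lq)."""
    expo = [lq[i] + lam * i for i in range(I + 1, len(lq))]
    top = max(expo)
    w = [math.exp(e - top) for e in expo]
    return sum((I + 1 + k) * wk for k, wk in enumerate(w)) / sum(w)


def conjectured_J(u, a, T, N=3000):
    """min R(pi || Q^a(T)) over pi with pi_i = u_i (i <= I), sum pi_i = 1, sum i pi_i = T."""
    I = len(u) - 1
    lq = log_Q(a, T, N)
    m0 = 1.0 - sum(u)                               # mass left for i > I
    m1 = T - sum(i * ui for i, ui in enumerate(u))  # balls left for i > I
    if m0 <= 0:
        raise ValueError("this sketch assumes sum(u) < 1")
    lo, hi = -30.0, math.log((a + T) / T) - 0.05    # keeps the tilted tail summable
    target = m1 / m0
    if not tail_mean(lq, I, lo) < target < tail_mean(lq, I, hi):
        raise ValueError("tail mean outside the range this sketch can reach")
    for _ in range(100):                            # tail_mean is increasing in lam
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if tail_mean(lq, I, mid) < target else (lo, mid)
    lam = 0.5 * (lo + hi)
    log_c = math.log(m0) - logsumexp([lq[i] + lam * i for i in range(I + 1, N)])
    head = sum(ui * (math.log(ui) - lq[i]) for i, ui in enumerate(u) if ui > 0)
    # For i > I, pi_i = c Q_i e^{lam i}, so sum_{i>I} pi_i log(pi_i / Q_i) = m0 log c + lam m1.
    return head + m0 * log_c + lam * m1


print(conjectured_J([0.5, 0.3, 0.1], a=2.0, T=1.0))
```

When $u_i = Q^a_i(T)$ for every $i \le I$, the constraints are met by $\pi^0 = Q^a(T)$ itself, so the computed value should be essentially zero.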

Moreover for a general initial condition $\alpha$,

$$J(\alpha, u) = \min_{\pi \in \Gamma(\alpha, u, T)} \sum_{k=0}^{I+1} \alpha_k\, R\left(\pi^k \,\middle\|\, Q^{a+k}\left(\frac{(a+k)T}{a}\right)\right).$$